Using Word-Sense Disambiguation Methods to Classify Web Queries by Intent

Authors

Emily Pitler

Ken Ward Church

Abstract

Three methods are proposed to classify queries by intent (CQI), e.g., navigational, informational, commercial, etc. Following mixed-initiative dialog systems, search engines should distinguish navigational queries where the user is taking the initiative from other queries where there are more opportunities for system initiatives (e.g., suggestions, ads). The query intent problem has a number of useful applications for search engines, affecting how many (if any) advertisements to display, which results to return, and how to arrange the results page. Click logs are used as a substitute for annotation. Clicks on ads are evidence for commercial intent; other types of clicks are evidence for other intents. We start with a simple Naïve Bayes baseline that works well when there is plenty of training data. When training data is less plentiful, we back off to nearby URLs in a click graph, using a method similar to Word-Sense Disambiguation. Thus, we can infer that designer trench is commercial because it is close to www.saksfifthavenue.com, which is known to be commercial. The baseline method was designed for precision and the backoff method was designed for recall. Both methods are fast and do not require crawling webpages. We recommend a third method, a hybrid of the two, that does no harm when there is plenty of training data, and generalizes better when there isn’t, as a strong baseline for the CQI task.

1 Classify Queries By Intent (CQI)

Determining query intent is an important problem for today’s search engines. Queries are short (consisting of 2.2 terms on average (Beitzel et al., 2004)) and contain ambiguous terms. Search engines need to derive what users want from this limited source of information. Users may be searching for a specific page, browsing for information, or trying to buy something. Guessing the correct intent is important for returning relevant items.
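The hybrid idea from the abstract (use the query's own click evidence when there is enough of it, otherwise back off to nearby URLs in the click graph) can be illustrated with a small sketch. This is not the authors' implementation: a smoothed ad-click rate stands in for the Naïve Bayes baseline, and the toy click log, the `min_clicks` threshold, the smoothing constants, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical click log: (query, clicked URL, was the click on an ad?).
CLICK_LOG = [
    ("trench coat", "www.saksfifthavenue.com", True),
    ("trench coat", "www.saksfifthavenue.com", True),
    ("trench coat", "www.saksfifthavenue.com", True),
    ("trench coat", "en.wikipedia.org", False),
    ("designer trench", "www.saksfifthavenue.com", False),
    ("trench warfare", "en.wikipedia.org", False),
    ("trench warfare", "en.wikipedia.org", False),
    ("world war I trench", "en.wikipedia.org", False),
]


def build_counts(log):
    """Collect [ad_clicks, total_clicks] per query and per URL, plus graph edges."""
    query_counts = defaultdict(lambda: [0, 0])
    url_counts = defaultdict(lambda: [0, 0])
    query_urls = defaultdict(set)  # click-graph edges: query -> clicked URLs
    for query, url, is_ad in log:
        for counts in (query_counts[query], url_counts[url]):
            counts[0] += int(is_ad)
            counts[1] += 1
        query_urls[query].add(url)
    return query_counts, url_counts, query_urls


def smoothed_rate(ad, total, prior=0.5, strength=1.0):
    """Ad-click rate pulled toward `prior` when few clicks were observed."""
    return (ad + prior * strength) / (total + strength)


def commercial_score(query, query_counts, url_counts, query_urls, min_clicks=3):
    """Return (score, source); source is 'direct', 'backoff', or 'prior'."""
    ad, total = query_counts.get(query, (0, 0))
    if total >= min_clicks:
        # Enough training data: trust the query's own clicks.
        return smoothed_rate(ad, total), "direct"
    urls = query_urls.get(query, ())
    if not urls:
        # No evidence at all: fall back to the uninformative prior.
        return 0.5, "prior"
    # Sparse data: average the commerciality of neighboring URLs.
    rates = [smoothed_rate(*url_counts[u]) for u in urls]
    return sum(rates) / len(rates), "backoff"


if __name__ == "__main__":
    qc, uc, qu = build_counts(CLICK_LOG)
    for q in ("trench coat", "designer trench", "world war I trench"):
        print(q, commercial_score(q, qc, uc, qu))
```

In this toy log, designer trench has a single non-ad click, so it is scored through its neighbor www.saksfifthavenue.com and comes out above 0.5, while world war I trench inherits a low score from en.wikipedia.org.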
Someone searching for designer trench is likely to be interested in results or ads for trench coats, while someone searching for world war I trench might be irritated by irrelevant clothing advertisements.

Broder (2002) and Rose and Levinson (2004) categorized queries into those with navigational, informational, and transactional or resource-seeking intent. Navigational queries are queries for which a user has a particular web page in mind that they are trying to navigate to, such as greyhound bus. Informational queries are those like San Francisco, in which the user is trying to gather information about a topic. Transactional queries are those like digital camera or download adobe reader, where the user is seeking to make a transaction or access an online resource.

Knowing the intent of a query greatly affects the type of results that are relevant. For many queries, Wikipedia articles are returned on the first page of results. For informational queries, this is usually appropriate, as a Wikipedia article contains summaries of topics and links to explore further. However, for navigational or transactional queries, Wikipedia is not as appropriate. A user looking for the greyhound bus homepage is probably not interested in facts about the company. Similarly, someone looking to download adobe reader will not be interested in Wikipedia’s description of the product’s history. Conversely, for informational queries, Wikipedia articles tend to be appropriate while advertisements are not. The user searching for world war I trench might find the Wikipedia article on trench warfare useful, while he is prob-

Similar resources

Understanding Users Intent by Deducing Domain Knowledge Hidden in Web Search Query Keywords

Search engines are used by people on a daily basis to retrieve information from the web. When an ambiguous word is present in a query, the specific sense of the keyword is not considered during the search process. Search engines return a large number of web pages as results from all the possible contexts. Users tend to browse only a few pages. Improving the quality of retrieved results is a challenge and...

Using crowd-sourcing for query classification and analysis

In order to gain a better understanding of users and the intent behind their web searching activities, the first steps involve the analysis of the query submitted by the user and correct categorical classification of the query as an input for further analyses. Natural Language Processing (NLP) is an area with many inaccuracies for problem areas in Word Sense Disambiguation (WSD) and terms detected that ...

A Genetic Fuzzy Semantic Web Search Agent Using Granular Semantic Trees for Ambiguous Queries

For most Web searching applications, queries are commonly ambiguous because words or phrases have different linguistic meanings for different Web users. Conventional keyword-based search engines cannot disambiguate queries to provide relevant results matching Web users’ intents. Traditional Word Sense Disambiguation (WSD) methods use statistical models or ontology-based knowledge systems to m...

Ontology Based Query Expansion Using Word Sense Disambiguation

The existing information retrieval techniques do not consider the context of the keywords present in the user’s queries. Therefore, the search engines sometimes do not provide sufficient information to the users. New methods based on the semantics of user keywords must be developed to search in the vast web space without incurring loss of information. The semantic based information retrieval te...

A Highest Sense Count Based Method for Disambiguation of Web Queries for Hindi Language Web Information Retrieval

The ambiguity in word senses has been recognized as a major challenge for the information retrieval systems. Hindi language web information retrieval, like other languages, faces the problem of sense ambiguity. The sense ambiguity problem deteriorates the performance of every natural language processing (NLP) application. The performance of Hindi language web information retrieval is also affec...

Publication date: 2009
